Overview


Dataset statistics

Number of variables: 16
Number of observations: 1915886
Missing cells: 175989
Missing cells (%): 0.6%
Duplicate rows: 558
Duplicate rows (%): < 0.1%
Total size in memory: 233.9 MiB
Average record size in memory: 128.0 B
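
The percentages and the average record size above follow from the raw counts. A minimal Python sketch of that arithmetic, using only the figures listed in this section (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 16
n_observations = 1_915_886
missing_cells = 175_989
duplicate_rows = 558
total_size_mib = 233.9

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.3f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The duplicate share comes out well below 0.1%, which is why the report shows it as "< 0.1%".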

Variable types

DateTime: 1
Categorical: 2
Text: 5
Unsupported: 3
Numeric: 5

Alerts

Dataset has 558 (< 0.1%) duplicate rows [Duplicates]
ARR_DELAY is highly overall correlated with CANCELLED and 1 other field [High correlation]
CANCELLED is highly overall correlated with ARR_DELAY [High correlation]
DEP_DELAY is highly overall correlated with ARR_DELAY [High correlation]
CANCELLED is highly imbalanced (82.1%) [Imbalance]
DEP_DELAY has 50351 (2.6%) missing values [Missing]
ARR_DELAY has 55991 (2.9%) missing values [Missing]
AIR_TIME has 56551 (3.0%) missing values [Missing]
OP_CARRIER_FL_NUM is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
AIR_TIME is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
DISTANCE is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
DEP_DELAY has 91189 (4.8%) zeros [Zeros]
ARR_DELAY has 34815 (1.8%) zeros [Zeros]
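
The "Unsupported" alerts usually point at columns that mix numbers with non-numeric text. The last rows in the Sample section of this report show DISTANCE as `****`, which is consistent with that. A minimal cleaning sketch follows; the helper `to_float_or_none` and the toy values are hypothetical, and it assumes unparseable entries should become missing values:

```python
from typing import Optional


def to_float_or_none(raw: object) -> Optional[float]:
    """Return raw as a float, or None when it cannot be parsed as a number."""
    if raw is None:
        return None
    try:
        return float(str(raw).strip())
    except ValueError:
        return None


# Hypothetical raw DISTANCE values: numbers as text, a placeholder, a gap.
raw_distance = ["1025.0", "930.0", "****", None, " 945 "]
clean_distance = [to_float_or_none(value) for value in raw_distance]
print(clean_distance)
```

With pandas, `pd.to_numeric(series, errors="coerce")` does the same coercion on a whole column, turning unparseable entries into NaN.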

Reproduction

Analysis started: 2025-04-30 23:10:06.862629
Analysis finished: 2025-04-30 23:11:20.834598
Duration: 1 minute and 13.97 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
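
The reported duration is the difference between the two timestamps above. A small sketch of that check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-04-30 23:10:06.862629", FMT)
finished = datetime.strptime("2025-04-30 23:11:20.834598", FMT)

# Elapsed wall-clock time, split into whole minutes and leftover seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```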

Variables

FL_DATE
DateTime

Distinct: 90
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB
Minimum: 2019-01-01 00:00:00
Maximum: 2019-03-31 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)

OP_CARRIER
Categorical

Distinct: 26
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB

Value              Count
WN                 330295
AA                 232973
DL                 225391
OO                 195141
UA                 144328
Other values (21)  787758

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 3831772
Distinct characters: 28
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: WN
2nd row: WN
3rd row: WN
4th row: WN
5th row: WN

Common Values

Value              Count   Frequency (%)
WN                 330295  17.2%
AA                 232973  12.2%
DL                 225391  11.8%
OO                 195141  10.2%
UA                 144328  7.5%
YX                 77168   4.0%
MQ                 75780   4.0%
B6                 72788   3.8%
OH                 68825   3.6%
AS                 61476   3.2%
Other values (16)  431721  22.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
wn 330295
17.2%
aa 232973
12.2%
dl 225391
11.8%
oo 195141
10.2%
ua 144328
 
7.5%
yx 77168
 
4.0%
mq 75780
 
4.0%
b6 72788
 
3.8%
oh 68825
 
3.6%
as 61476
 
3.2%
Other values (16) 431721
22.5%

Most occurring characters

ValueCountFrequency (%)
A 710100
18.5%
O 459107
12.0%
N 376355
 
9.8%
W 356282
 
9.3%
D 225391
 
5.9%
L 225391
 
5.9%
U 144328
 
3.8%
Y 130867
 
3.4%
X 124463
 
3.2%
Q 104414
 
2.7%
Other values (18) 975074
25.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 3611939
94.3%
Decimal Number 219833
 
5.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 710100
19.7%
O 459107
12.7%
N 376355
10.4%
W 356282
9.9%
D 225391
 
6.2%
L 225391
 
6.2%
U 144328
 
4.0%
Y 130867
 
3.6%
X 124463
 
3.4%
Q 104414
 
2.9%
Other values (13) 755241
20.9%
Decimal Number
ValueCountFrequency (%)
9 89147
40.6%
6 72788
33.1%
4 24294
 
11.1%
7 21221
 
9.7%
5 12383
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 3611939
94.3%
Common 219833
 
5.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 710100
19.7%
O 459107
12.7%
N 376355
10.4%
W 356282
9.9%
D 225391
 
6.2%
L 225391
 
6.2%
U 144328
 
4.0%
Y 130867
 
3.6%
X 124463
 
3.4%
Q 104414
 
2.9%
Other values (13) 755241
20.9%
Common
ValueCountFrequency (%)
9 89147
40.6%
6 72788
33.1%
4 24294
 
11.1%
7 21221
 
9.7%
5 12383
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3831772
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 710100
18.5%
O 459107
12.0%
N 376355
 
9.8%
W 356282
 
9.3%
D 225391
 
5.9%
L 225391
 
5.9%
U 144328
 
3.8%
Y 130867
 
3.4%
X 124463
 
3.2%
Q 104414
 
2.7%
Other values (18) 975074
25.4%

TAIL_NUM
Text

Distinct: 6032
Distinct (%): 0.3%
Missing: 12156
Missing (%): 0.6%
Memory size: 14.6 MiB

Length

Max length: 6
Median length: 6
Mean length: 5.984959
Min length: 3

Characters and Unicode

Total characters: 11393746
Distinct characters: 34
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): < 0.1%

Sample

1st row: N955WN
2nd row: N8686A
3rd row: N201LV
4th row: N413WN
5th row: N7832A
ValueCountFrequency (%)
n485ha 928
 
< 0.1%
n483ha 882
 
< 0.1%
n479ha 882
 
< 0.1%
n480ha 875
 
< 0.1%
n491ha 873
 
< 0.1%
n478ha 873
 
< 0.1%
n488ha 865
 
< 0.1%
n492ha 852
 
< 0.1%
n486ha 830
 
< 0.1%
n481ha 822
 
< 0.1%
Other values (6022) 1895048
99.5%

Most occurring characters

ValueCountFrequency (%)
N 2452769
21.5%
8 707327
 
6.2%
9 670104
 
5.9%
2 655270
 
5.8%
1 646872
 
5.7%
3 639856
 
5.6%
7 637062
 
5.6%
6 630969
 
5.5%
4 618458
 
5.4%
5 596772
 
5.2%
Other values (24) 3138287
27.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 6233123
54.7%
Uppercase Letter 5160623
45.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 2452769
47.5%
A 476263
 
9.2%
W 354473
 
6.9%
S 281640
 
5.5%
D 142991
 
2.8%
U 136019
 
2.6%
J 134311
 
2.6%
E 131976
 
2.6%
B 115776
 
2.2%
K 108109
 
2.1%
Other values (14) 826296
 
16.0%
Decimal Number
ValueCountFrequency (%)
8 707327
11.3%
9 670104
10.8%
2 655270
10.5%
1 646872
10.4%
3 639856
10.3%
7 637062
10.2%
6 630969
10.1%
4 618458
9.9%
5 596772
9.6%
0 430433
6.9%

Most occurring scripts

ValueCountFrequency (%)
Common 6233123
54.7%
Latin 5160623
45.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 2452769
47.5%
A 476263
 
9.2%
W 354473
 
6.9%
S 281640
 
5.5%
D 142991
 
2.8%
U 136019
 
2.6%
J 134311
 
2.6%
E 131976
 
2.6%
B 115776
 
2.2%
K 108109
 
2.1%
Other values (14) 826296
 
16.0%
Common
ValueCountFrequency (%)
8 707327
11.3%
9 670104
10.8%
2 655270
10.5%
1 646872
10.4%
3 639856
10.3%
7 637062
10.2%
6 630969
10.1%
4 618458
9.9%
5 596772
9.6%
0 430433
6.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11393746
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 2452769
21.5%
8 707327
 
6.2%
9 670104
 
5.9%
2 655270
 
5.8%
1 646872
 
5.7%
3 639856
 
5.6%
7 637062
 
5.6%
6 630969
 
5.5%
4 618458
 
5.4%
5 596772
 
5.2%
Other values (24) 3138287
27.5%

OP_CARRIER_FL_NUM
Unsupported

Rejected  Unsupported 

Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB

ORIGIN_AIRPORT_ID
Real number (ℝ)

Distinct: 361
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12688.148
Minimum: 10135
Maximum: 16218
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.6 MiB

Quantile statistics

Minimum: 10135
5-th percentile: 10397
Q1: 11292
median: 12889
Q3: 14057
95-th percentile: 14893
Maximum: 16218
Range: 6083
Interquartile range (IQR): 2765

Descriptive statistics

Standard deviation: 1521.9003
Coefficient of variation (CV): 0.11994661
Kurtosis: -1.3144165
Mean: 12688.148
Median Absolute Deviation (MAD): 1456
Skewness: 0.063539771
Sum: 2.4309045 × 10^10
Variance: 2316180.6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
10397 93763
 
4.9%
13930 92227
 
4.8%
11298 69960
 
3.7%
11292 64616
 
3.4%
11057 61579
 
3.2%
12892 60182
 
3.1%
14107 45568
 
2.4%
12266 43910
 
2.3%
14747 43823
 
2.3%
14771 41890
 
2.2%
Other values (351) 1298368
67.8%
ValueCountFrequency (%)
10135 1310
 
0.1%
10136 490
 
< 0.1%
10140 6251
0.3%
10141 180
 
< 0.1%
10146 250
 
< 0.1%
10155 341
 
< 0.1%
10157 365
 
< 0.1%
10158 870
 
< 0.1%
10165 26
 
< 0.1%
10170 154
 
< 0.1%
ValueCountFrequency (%)
16218 375
 
< 0.1%
16101 328
 
< 0.1%
15991 178
 
< 0.1%
15919 3484
0.2%
15841 180
 
< 0.1%
15624 1614
 
0.1%
15607 253
 
< 0.1%
15582 153
 
< 0.1%
15454 158
 
< 0.1%
15412 4707
0.2%

ORIGIN
Text

Distinct: 361
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5747658
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RSW
2nd row: RSW
3rd row: RSW
4th row: RSW
5th row: RSW
ValueCountFrequency (%)
atl 93763
 
4.9%
ord 92227
 
4.8%
dfw 69960
 
3.7%
den 64616
 
3.4%
clt 61579
 
3.2%
lax 60182
 
3.1%
phx 45568
 
2.4%
iah 43910
 
2.3%
sea 43823
 
2.3%
sfo 41890
 
2.2%
Other values (351) 1298368
67.8%

Most occurring characters

ValueCountFrequency (%)
A 636257
 
11.1%
L 537540
 
9.4%
S 471225
 
8.2%
D 455861
 
7.9%
T 321423
 
5.6%
O 305142
 
5.3%
C 292061
 
5.1%
M 250272
 
4.4%
R 234630
 
4.1%
P 233127
 
4.1%
Other values (16) 2010120
35.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5747658
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 636257
 
11.1%
L 537540
 
9.4%
S 471225
 
8.2%
D 455861
 
7.9%
T 321423
 
5.6%
O 305142
 
5.3%
C 292061
 
5.1%
M 250272
 
4.4%
R 234630
 
4.1%
P 233127
 
4.1%
Other values (16) 2010120
35.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5747658
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 636257
 
11.1%
L 537540
 
9.4%
S 471225
 
8.2%
D 455861
 
7.9%
T 321423
 
5.6%
O 305142
 
5.3%
C 292061
 
5.1%
M 250272
 
4.4%
R 234630
 
4.1%
P 233127
 
4.1%
Other values (16) 2010120
35.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5747658
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 636257
 
11.1%
L 537540
 
9.4%
S 471225
 
8.2%
D 455861
 
7.9%
T 321423
 
5.6%
O 305142
 
5.3%
C 292061
 
5.1%
M 250272
 
4.4%
R 234630
 
4.1%
P 233127
 
4.1%
Other values (16) 2010120
35.0%

ORIGIN_CITY_NAME
Text

Distinct: 355
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB

Length

Max length: 34
Median length: 29
Mean length: 13.127786
Min length: 8

Characters and Unicode

Total characters: 25151342
Distinct characters: 57
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Fort Myers, FL
2nd row: Fort Myers, FL
3rd row: Fort Myers, FL
4th row: Fort Myers, FL
5th row: Fort Myers, FL
ValueCountFrequency (%)
ca 208407
 
4.7%
tx 191353
 
4.3%
fl 154819
 
3.5%
il 116787
 
2.6%
chicago 111550
 
2.5%
ny 100614
 
2.3%
san 100434
 
2.3%
ga 100109
 
2.3%
atlanta 93763
 
2.1%
nc 88964
 
2.0%
Other values (432) 3178709
71.5%

Most occurring characters

ValueCountFrequency (%)
2529623
 
10.1%
, 1915886
 
7.6%
a 1891495
 
7.5%
o 1418940
 
5.6%
e 1327363
 
5.3%
n 1237686
 
4.9%
t 1212999
 
4.8%
l 1105059
 
4.4%
i 994357
 
4.0%
r 893295
 
3.6%
Other values (47) 10624639
42.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 14043408
55.8%
Uppercase Letter 6501852
25.9%
Space Separator 2529623
 
10.1%
Other Punctuation 2075056
 
8.3%
Dash Punctuation 1403
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1891495
13.5%
o 1418940
10.1%
e 1327363
9.5%
n 1237686
8.8%
t 1212999
8.6%
l 1105059
7.9%
i 994357
7.1%
r 893295
 
6.4%
s 871783
 
6.2%
h 608724
 
4.3%
Other values (16) 2481707
17.7%
Uppercase Letter
ValueCountFrequency (%)
A 805425
 
12.4%
C 751389
 
11.6%
N 543421
 
8.4%
L 507774
 
7.8%
D 345067
 
5.3%
M 333237
 
5.1%
F 325764
 
5.0%
T 308995
 
4.8%
S 289864
 
4.5%
O 274468
 
4.2%
Other values (16) 2016448
31.0%
Other Punctuation
ValueCountFrequency (%)
, 1915886
92.3%
/ 137792
 
6.6%
. 21378
 
1.0%
Space Separator
ValueCountFrequency (%)
2529623
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1403
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20545260
81.7%
Common 4606082
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1891495
 
9.2%
o 1418940
 
6.9%
e 1327363
 
6.5%
n 1237686
 
6.0%
t 1212999
 
5.9%
l 1105059
 
5.4%
i 994357
 
4.8%
r 893295
 
4.3%
s 871783
 
4.2%
A 805425
 
3.9%
Other values (42) 8786858
42.8%
Common
ValueCountFrequency (%)
2529623
54.9%
, 1915886
41.6%
/ 137792
 
3.0%
. 21378
 
0.5%
- 1403
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25151342
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2529623
 
10.1%
, 1915886
 
7.6%
a 1891495
 
7.5%
o 1418940
 
5.6%
e 1327363
 
5.3%
n 1237686
 
4.9%
t 1212999
 
4.8%
l 1105059
 
4.4%
i 994357
 
4.0%
r 893295
 
3.6%
Other values (47) 10624639
42.2%

DEST_AIRPORT_ID
Real number (ℝ)

Distinct: 361
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12689.267
Minimum: 10135
Maximum: 16218
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.6 MiB

Quantile statistics

Minimum: 10135
5-th percentile: 10397
Q1: 11292
median: 12889
Q3: 14057
95-th percentile: 14893
Maximum: 16218
Range: 6083
Interquartile range (IQR): 2765

Descriptive statistics

Standard deviation: 1521.2486
Coefficient of variation (CV): 0.11988467
Kurtosis: -1.3148134
Mean: 12689.267
Median Absolute Deviation (MAD): 1456
Skewness: 0.061895338
Sum: 2.4311188 × 10^10
Variance: 2314197.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
10397 93635
 
4.9%
13930 93278
 
4.9%
11298 69580
 
3.6%
11292 64625
 
3.4%
11057 61946
 
3.2%
12892 60305
 
3.1%
14107 45644
 
2.4%
12266 43890
 
2.3%
14747 43825
 
2.3%
14771 42540
 
2.2%
Other values (351) 1296618
67.7%
ValueCountFrequency (%)
10135 1311
 
0.1%
10136 490
 
< 0.1%
10140 5952
0.3%
10141 181
 
< 0.1%
10146 250
 
< 0.1%
10155 341
 
< 0.1%
10157 366
 
< 0.1%
10158 870
 
< 0.1%
10165 26
 
< 0.1%
10170 154
 
< 0.1%
ValueCountFrequency (%)
16218 376
 
< 0.1%
16101 328
 
< 0.1%
15991 178
 
< 0.1%
15919 3485
0.2%
15841 180
 
< 0.1%
15624 1615
 
0.1%
15607 253
 
< 0.1%
15582 153
 
< 0.1%
15454 159
 
< 0.1%
15412 4709
0.2%

DESTINATION
Text

Distinct: 361
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5747658
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CLE
2nd row: CMH
3rd row: CMH
4th row: CMH
5th row: DAL
ValueCountFrequency (%)
atl 93635
 
4.9%
ord 93278
 
4.9%
dfw 69580
 
3.6%
den 64625
 
3.4%
clt 61946
 
3.2%
lax 60305
 
3.1%
phx 45644
 
2.4%
iah 43890
 
2.3%
sea 43825
 
2.3%
sfo 42540
 
2.2%
Other values (351) 1296618
67.7%

Most occurring characters

ValueCountFrequency (%)
A 635492
 
11.1%
L 538220
 
9.4%
S 470928
 
8.2%
D 456870
 
7.9%
T 320873
 
5.6%
O 306272
 
5.3%
C 292499
 
5.1%
M 250164
 
4.4%
R 235559
 
4.1%
P 233075
 
4.1%
Other values (16) 2007706
34.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5747658
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 635492
 
11.1%
L 538220
 
9.4%
S 470928
 
8.2%
D 456870
 
7.9%
T 320873
 
5.6%
O 306272
 
5.3%
C 292499
 
5.1%
M 250164
 
4.4%
R 235559
 
4.1%
P 233075
 
4.1%
Other values (16) 2007706
34.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 5747658
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 635492
 
11.1%
L 538220
 
9.4%
S 470928
 
8.2%
D 456870
 
7.9%
T 320873
 
5.6%
O 306272
 
5.3%
C 292499
 
5.1%
M 250164
 
4.4%
R 235559
 
4.1%
P 233075
 
4.1%
Other values (16) 2007706
34.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5747658
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 635492
 
11.1%
L 538220
 
9.4%
S 470928
 
8.2%
D 456870
 
7.9%
T 320873
 
5.6%
O 306272
 
5.3%
C 292499
 
5.1%
M 250164
 
4.4%
R 235559
 
4.1%
P 233075
 
4.1%
Other values (16) 2007706
34.9%

DEST_CITY_NAME
Text

Distinct: 355
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB

Length

Max length: 34
Median length: 29
Mean length: 13.128301
Min length: 8

Characters and Unicode

Total characters: 25152329
Distinct characters: 57
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Cleveland, OH
2nd row: Columbus, OH
3rd row: Columbus, OH
4th row: Columbus, OH
5th row: Dallas, TX
ValueCountFrequency (%)
ca 208820
 
4.7%
tx 190973
 
4.3%
fl 154482
 
3.5%
il 117881
 
2.7%
chicago 112641
 
2.5%
san 101006
 
2.3%
ny 100596
 
2.3%
ga 99984
 
2.2%
atlanta 93635
 
2.1%
nc 89333
 
2.0%
Other values (432) 3176051
71.4%

Most occurring characters

ValueCountFrequency (%)
2529516
 
10.1%
, 1915886
 
7.6%
a 1892431
 
7.5%
o 1419297
 
5.6%
e 1327436
 
5.3%
n 1237775
 
4.9%
t 1211404
 
4.8%
l 1104155
 
4.4%
i 995691
 
4.0%
r 893144
 
3.6%
Other values (47) 10625594
42.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 14045335
55.8%
Uppercase Letter 6501372
25.8%
Space Separator 2529516
 
10.1%
Other Punctuation 2074703
 
8.2%
Dash Punctuation 1403
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1892431
13.5%
o 1419297
10.1%
e 1327436
9.5%
n 1237775
8.8%
t 1211404
8.6%
l 1104155
7.9%
i 995691
7.1%
r 893144
 
6.4%
s 870984
 
6.2%
h 609904
 
4.3%
Other values (16) 2483114
17.7%
Uppercase Letter
ValueCountFrequency (%)
A 804496
 
12.4%
C 754052
 
11.6%
N 543450
 
8.4%
L 508773
 
7.8%
D 344392
 
5.3%
M 331933
 
5.1%
F 325415
 
5.0%
T 308579
 
4.7%
S 290267
 
4.5%
O 274235
 
4.2%
Other values (16) 2015780
31.0%
Other Punctuation
ValueCountFrequency (%)
, 1915886
92.3%
/ 137418
 
6.6%
. 21399
 
1.0%
Space Separator
ValueCountFrequency (%)
2529516
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1403
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20546707
81.7%
Common 4605622
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1892431
 
9.2%
o 1419297
 
6.9%
e 1327436
 
6.5%
n 1237775
 
6.0%
t 1211404
 
5.9%
l 1104155
 
5.4%
i 995691
 
4.8%
r 893144
 
4.3%
s 870984
 
4.2%
A 804496
 
3.9%
Other values (42) 8789894
42.8%
Common
ValueCountFrequency (%)
2529516
54.9%
, 1915886
41.6%
/ 137418
 
3.0%
. 21399
 
0.5%
- 1403
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25152329
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2529516
 
10.1%
, 1915886
 
7.6%
a 1892431
 
7.5%
o 1419297
 
5.6%
e 1327436
 
5.3%
n 1237775
 
4.9%
t 1211404
 
4.8%
l 1104155
 
4.4%
i 995691
 
4.0%
r 893144
 
3.6%
Other values (47) 10625594
42.2%

DEP_DELAY
Real number (ℝ)

High correlation  Missing  Zeros 

Distinct: 1358
Distinct (%): 0.1%
Missing: 50351
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.802747
Minimum: -63
Maximum: 2941
Zeros: 91189
Zeros (%): 4.8%
Negative: 1134513
Negative (%): 59.2%
Memory size: 14.6 MiB

Quantile statistics

Minimum: -63
5-th percentile: -10
Q1: -6
median: -2
Q3: 7
95-th percentile: 77
Maximum: 2941
Range: 3004
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 50.163046
Coefficient of variation (CV): 4.6435453
Kurtosis: 179.39921
Mean: 10.802747
Median Absolute Deviation (MAD): 5
Skewness: 10.063645
Sum: 20152903
Variance: 2516.3312
Monotonicity: Not monotonic
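
Several of the DEP_DELAY figures above are derived from one another, which makes them easy to cross-check. The sketch below recomputes the range, IQR, coefficient of variation, variance, and mean from the other reported values; it assumes the mean and sum are taken over non-missing rows only:

```python
# Figures copied from the DEP_DELAY statistics above.
n_observations = 1_915_886
missing = 50_351
minimum, maximum = -63, 2941
q1, q3 = -6, 7
mean = 10.802747
std = 50.163046
total = 20_152_903

value_range = maximum - minimum          # reported Range: 3004
iqr = q3 - q1                            # reported IQR: 13
cv = std / mean                          # reported CV: 4.6435453
variance = std ** 2                      # reported Variance: 2516.3312
non_missing = n_observations - missing   # rows that carry a delay value
implied_mean = total / non_missing       # should agree with the reported Mean

print(value_range, iqr, round(cv, 4), round(variance, 2), round(implied_mean, 4))
```

The large gap between the median (-2) and the mean (about 10.8) reflects the long right tail that the skewness and kurtosis values also indicate.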
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-5 145399
 
7.6%
-4 138929
 
7.3%
-3 133393
 
7.0%
-2 120925
 
6.3%
-6 117068
 
6.1%
-1 105567
 
5.5%
-7 97253
 
5.1%
0 91189
 
4.8%
-8 75138
 
3.9%
-9 56371
 
2.9%
Other values (1348) 784303
40.9%
ValueCountFrequency (%)
-63 1
 
< 0.1%
-56 1
 
< 0.1%
-55 1
 
< 0.1%
-50 1
 
< 0.1%
-49 2
 
< 0.1%
-48 1
 
< 0.1%
-47 5
< 0.1%
-46 2
 
< 0.1%
-45 1
 
< 0.1%
-44 2
 
< 0.1%
ValueCountFrequency (%)
2941 1
< 0.1%
2672 1
< 0.1%
2209 1
< 0.1%
2064 1
< 0.1%
1959 1
< 0.1%
1840 1
< 0.1%
1742 1
< 0.1%
1690 1
< 0.1%
1674 1
< 0.1%
1663 1
< 0.1%

ARR_DELAY
Real number (ℝ)

High correlation  Missing  Zeros 

Distinct: 1393
Distinct (%): 0.1%
Missing: 55991
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.6487366
Minimum: -94
Maximum: 2923
Zeros: 34815
Zeros (%): 1.8%
Negative: 1161110
Negative (%): 60.6%
Memory size: 14.6 MiB

Quantile statistics

Minimum: -94
5-th percentile: -28
Q1: -15
median: -6
Q3: 8
95-th percentile: 77
Maximum: 2923
Range: 3017
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 52.411696
Coefficient of variation (CV): 9.278481
Kurtosis: 153.08975
Mean: 5.6487366
Median Absolute Deviation (MAD): 11
Skewness: 8.9943426
Sum: 10506057
Variance: 2746.9858
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-10 52080
 
2.7%
-11 52075
 
2.7%
-9 51957
 
2.7%
-12 51809
 
2.7%
-13 50903
 
2.7%
-8 50422
 
2.6%
-14 49472
 
2.6%
-7 49048
 
2.6%
-15 47970
 
2.5%
-6 46773
 
2.4%
Other values (1383) 1357386
70.8%
(Missing) 55991
 
2.9%
ValueCountFrequency (%)
-94 1
 
< 0.1%
-87 1
 
< 0.1%
-85 2
< 0.1%
-84 1
 
< 0.1%
-83 1
 
< 0.1%
-81 2
< 0.1%
-79 2
< 0.1%
-78 2
< 0.1%
-77 2
< 0.1%
-76 3
< 0.1%
ValueCountFrequency (%)
2923 1
< 0.1%
2649 1
< 0.1%
2206 1
< 0.1%
2050 1
< 0.1%
1928 1
< 0.1%
1865 1
< 0.1%
1726 1
< 0.1%
1707 1
< 0.1%
1685 1
< 0.1%
1652 1
< 0.1%

CANCELLED
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.6 MiB

Value  Count
0.0    1864272
1.0    51614

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 5747658
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

ValueCountFrequency (%)
0.0 1864272
97.3%
1.0 51614
 
2.7%
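
The 82.1% imbalance figure reported for CANCELLED can be reproduced from these two counts as one minus the normalized Shannon entropy of the class distribution. This is an assumption about how the score is defined, checked here only against the counts above; `imbalance_score` is an illustrative name, not a library function:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Counts of CANCELLED = 0.0 and CANCELLED = 1.0 from the table above.
score = imbalance_score([1_864_272, 51_614])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and the score approaches 1 as a single class dominates.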

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.0 1864272
97.3%
1.0 51614
 
2.7%

Most occurring characters

ValueCountFrequency (%)
0 3780158
65.8%
. 1915886
33.3%
1 51614
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3831772
66.7%
Other Punctuation 1915886
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3780158
98.7%
1 51614
 
1.3%
Other Punctuation
ValueCountFrequency (%)
. 1915886
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5747658
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3780158
65.8%
. 1915886
33.3%
1 51614
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5747658
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3780158
65.8%
. 1915886
33.3%
1 51614
 
0.9%

AIR_TIME
Unsupported

Missing  Rejected  Unsupported 

Missing: 56551
Missing (%): 3.0%
Memory size: 14.6 MiB

DISTANCE
Unsupported

Rejected  Unsupported 

Missing: 630
Missing (%): < 0.1%
Memory size: 14.6 MiB

OCCUPANCY_RATE
Real number (ℝ)

Distinct: 545
Distinct (%): < 0.1%
Missing: 310
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.65023395
Minimum: 0.3
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.6 MiB

Quantile statistics

Minimum: 0.3
5-th percentile: 0.34
Q1: 0.48
median: 0.65
Q3: 0.82473862
95-th percentile: 0.96
Maximum: 1
Range: 0.7
Interquartile range (IQR): 0.34473862

Descriptive statistics

Standard deviation: 0.20199945
Coefficient of variation (CV): 0.31065657
Kurtosis: -1.197604
Mean: 0.65023395
Median Absolute Deviation (MAD): 0.17
Skewness: -0.0020310438
Sum: 1245572.5
Variance: 0.040803777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.34 27679
 
1.4%
0.78 27671
 
1.4%
0.67 27614
 
1.4%
0.63 27607
 
1.4%
0.8 27532
 
1.4%
0.84 27528
 
1.4%
0.4 27523
 
1.4%
0.55 27513
 
1.4%
0.62 27512
 
1.4%
0.87 27510
 
1.4%
Other values (535) 1639887
85.6%
ValueCountFrequency (%)
0.3 13501
0.7%
0.300755696 10
 
< 0.1%
0.303243726 10
 
< 0.1%
0.303512676 10
 
< 0.1%
0.30376153 10
 
< 0.1%
0.307593844 10
 
< 0.1%
0.308580912 10
 
< 0.1%
0.31 27169
1.4%
0.311331882 10
 
< 0.1%
0.313140949 10
 
< 0.1%
ValueCountFrequency (%)
1 13711
0.7%
0.998723206 10
 
< 0.1%
0.998308907 10
 
< 0.1%
0.995233788 10
 
< 0.1%
0.993263031 10
 
< 0.1%
0.992847434 10
 
< 0.1%
0.99282516 10
 
< 0.1%
0.99 26972
1.4%
0.989766089 10
 
< 0.1%
0.989274599 10
 
< 0.1%

Interactions

[Figure: pairwise interaction plots, 25 panels]

Correlations

                   ARR_DELAY  CANCELLED  DEP_DELAY  DEST_AIRPORT_ID  OCCUPANCY_RATE  OP_CARRIER  ORIGIN_AIRPORT_ID
ARR_DELAY          1.000      1.000      0.649      0.029            -0.001          0.021       -0.001
CANCELLED          1.000      1.000      0.016      0.031            0.002           0.100       0.028
DEP_DELAY          0.649      0.016      1.000      0.014            -0.002          0.020       -0.019
DEST_AIRPORT_ID    0.029      0.031      0.014      1.000            -0.001          0.200       0.022
OCCUPANCY_RATE     -0.001     0.002      -0.002     -0.001           1.000           0.000       -0.000
OP_CARRIER         0.021      0.100      0.020      0.200            0.000           1.000       0.200
ORIGIN_AIRPORT_ID  -0.001     0.028      -0.019     0.022            -0.000          0.200       1.000
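
The "High correlation" alerts in the Overview can be traced back to this matrix by scanning the off-diagonal entries. The sketch below does that with a cut-off of 0.5; the threshold is an assumption chosen for illustration and is not stated anywhere in the report:

```python
from itertools import combinations

names = ["ARR_DELAY", "CANCELLED", "DEP_DELAY", "DEST_AIRPORT_ID",
         "OCCUPANCY_RATE", "OP_CARRIER", "ORIGIN_AIRPORT_ID"]

# Correlation matrix copied from the report, rows and columns in `names` order.
matrix = [
    [1.000, 1.000, 0.649, 0.029, -0.001, 0.021, -0.001],
    [1.000, 1.000, 0.016, 0.031, 0.002, 0.100, 0.028],
    [0.649, 0.016, 1.000, 0.014, -0.002, 0.020, -0.019],
    [0.029, 0.031, 0.014, 1.000, -0.001, 0.200, 0.022],
    [-0.001, 0.002, -0.002, -0.001, 1.000, 0.000, -0.000],
    [0.021, 0.100, 0.020, 0.200, 0.000, 1.000, 0.200],
    [-0.001, 0.028, -0.019, 0.022, -0.000, 0.200, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the report

# Visit each unordered pair once and keep those at or above the threshold.
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i, j in combinations(range(len(names)), 2)
    if abs(matrix[i][j]) >= THRESHOLD
]

for left, right, value in high_pairs:
    print(f"{left} ~ {right}: {value:.3f}")
```

With this threshold the scan yields the ARR_DELAY/CANCELLED and ARR_DELAY/DEP_DELAY pairs, the same pairs named in the alerts.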

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
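
Nullity correlation can be sketched as an ordinary Pearson correlation between 0/1 missing-value indicators. The example below uses hypothetical toy rows (not taken from the dataset) and a hand-written `pearson` helper, so it illustrates the idea only and is not the report's implementation:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy rows; None marks a missing value.
rows = [
    {"DEP_DELAY": -5.0, "ARR_DELAY": -9.0, "AIR_TIME": 110.0},
    {"DEP_DELAY": None, "ARR_DELAY": None, "AIR_TIME": None},
    {"DEP_DELAY": 12.0, "ARR_DELAY": None, "AIR_TIME": None},
    {"DEP_DELAY": 0.0, "ARR_DELAY": 4.0, "AIR_TIME": 132.0},
    {"DEP_DELAY": None, "ARR_DELAY": None, "AIR_TIME": None},
]


def is_missing(column):
    """Return a 0/1 indicator per row: 1 where the column is missing."""
    return [1 if row[column] is None else 0 for row in rows]


arr_vs_air = pearson(is_missing("ARR_DELAY"), is_missing("AIR_TIME"))
dep_vs_arr = pearson(is_missing("DEP_DELAY"), is_missing("ARR_DELAY"))
print(round(arr_vs_air, 3), round(dep_vs_arr, 3))
```

A value near 1 means two columns tend to be missing in the same rows, while a value near 0 means their gaps are unrelated.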

Sample

First rows

   FL_DATE     OP_CARRIER  TAIL_NUM  OP_CARRIER_FL_NUM  ORIGIN_AIRPORT_ID  ORIGIN  ORIGIN_CITY_NAME  DEST_AIRPORT_ID  DESTINATION  DEST_CITY_NAME    DEP_DELAY  ARR_DELAY  CANCELLED  AIR_TIME  DISTANCE  OCCUPANCY_RATE
0  2019-03-02  WN          N955WN    4591               14635              RSW     Fort Myers, FL    11042            CLE          Cleveland, OH     -8.0       -6.0       0.0        143.0     1025.0    0.97
1  2019-03-02  WN          N8686A    3231               14635              RSW     Fort Myers, FL    11066            CMH          Columbus, OH      1.0        5.0        0.0        135.0     930.0     0.55
2  2019-03-02  WN          N201LV    3383               14635              RSW     Fort Myers, FL    11066            CMH          Columbus, OH      0.0        4.0        0.0        132.0     930.0     0.91
3  2019-03-02  WN          N413WN    5498               14635              RSW     Fort Myers, FL    11066            CMH          Columbus, OH      11.0       14.0       0.0        136.0     930.0     0.67
4  2019-03-02  WN          N7832A    6933               14635              RSW     Fort Myers, FL    11259            DAL          Dallas, TX        0.0        -17.0      0.0        151.0     1005.0    0.62
5  2019-03-02  WN          N492WN    3960               14635              RSW     Fort Myers, FL    11986            GRR          Grand Rapids, MI  -2.0       -8.0       0.0        162.0     1147.0    0.49
6  2019-03-02  WN          N8634A    3826               14635              RSW     Fort Myers, FL    12339            IND          Indianapolis, IN  11.0       17.0       0.0        143.0     945.0     0.49
7  2019-03-02  WN          N210WN    4014               14635              RSW     Fort Myers, FL    12339            IND          Indianapolis, IN  -5.0       8.0        0.0        139.0     945.0     0.37
8  2019-03-02  WN          N232WN    4492               14635              RSW     Fort Myers, FL    12339            IND          Indianapolis, IN  -9.0       -16.0      0.0        135.0     945.0     0.31
9  2019-03-02  WN          N7889A    4899               14635              RSW     Fort Myers, FL    12339            IND          Indianapolis, IN  0.0        11.0       0.0        150.0     945.0     0.70

Last rows

         FL_DATE  OP_CARRIER  TAIL_NUM  OP_CARRIER_FL_NUM  ORIGIN_AIRPORT_ID  ORIGIN  ORIGIN_CITY_NAME  DEST_AIRPORT_ID  DESTINATION  DEST_CITY_NAME  DEP_DELAY  ARR_DELAY  CANCELLED  AIR_TIME  DISTANCE  OCCUPANCY_RATE
1915876  3/18/19  AA          N837NN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   3.0        -1.0       0.0        109       ****      0.548402
1915877  3/19/19  AA          N840NN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -5.0       -9.0       0.0        110       ****      0.862933
1915878  3/20/19  AA          N346PR    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -3.0       -11.0      0.0        103       ****      0.359428
1915879  3/21/19  AA          N927NN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -9.0       -10.0      0.0        111       ****      0.602012
1915880  3/22/19  AA          N893NN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   2.0        -8.0       0.0        105       ****      0.840772
1915881  3/23/19  AA          N903NN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -9.0       -6.0       0.0        112       ****      0.794884
1915882  3/24/19  AA          N965AN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -2.0       -1.0       0.0        106       ****      0.538399
1915883  3/25/19  AA          N979NN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -8.0       -25.0      0.0        106       ****      0.955579
1915884  3/26/19  AA          N872NN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -9.0       -6.0       0.0        112       ****      0.595344
1915885  3/27/19  AA          N945AN    1433               15370              TUL     Tulsa, OK         11057            CLT          Charlotte, NC   -8.0       5.0        0.0        117       ****      0.350192
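
The sample rows show FL_DATE written in two ways, `2019-03-02` in the first rows and `3/18/19` in the last rows. A minimal sketch of normalizing both spellings follows; the helper `parse_fl_date` is hypothetical, and it assumes the slash form is month/day/two-digit year:

```python
from datetime import date, datetime

# Formats tried in order; the second one is an assumption about the slash form.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%y")


def parse_fl_date(raw: str) -> date:
    """Parse an FL_DATE string in either of the two observed spellings."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized FL_DATE value: {raw!r}")


print(parse_fl_date("2019-03-02"))
print(parse_fl_date("3/18/19"))
```

Raising on unknown spellings, rather than returning a default, keeps unexpected formats visible during cleaning.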

Duplicate rows

Most frequently occurring

   FL_DATE  OP_CARRIER  TAIL_NUM  ORIGIN_AIRPORT_ID  ORIGIN  ORIGIN_CITY_NAME  DEST_AIRPORT_ID  DESTINATION  DEST_CITY_NAME  DEP_DELAY  ARR_DELAY  CANCELLED  OCCUPANCY_RATE  # duplicates
0  1/1/19   B6          N965WN    14635              RSW     Fort Myers, FL    14307            PVD          Providence, RI  -5.0       -13.0      0.0        0.375903        10
1  1/12/19  DL          N8317M    14635              RSW     Fort Myers, FL    13232            MDW          Chicago, IL     -2.0       11.0       0.0        0.345389        10
2  1/18/19  UA          N746SW    14679              SAN     San Diego, CA     10693            BNA          Nashville, TN   28.0       8.0        0.0        0.801383        10
3  1/2/19   NK          N496WN    14635              RSW     Fort Myers, FL    15016            STL          St. Louis, MO   -3.0       2.0        0.0        0.698936        10
4  1/2/19   UA          N782SA    14635              RSW     Fort Myers, FL    13232            MDW          Chicago, IL     5.0        8.0        0.0        0.801909        10
5  1/23/19  B6          N8730Q    14635              RSW     Fort Myers, FL    14576            ROC          Rochester, NY   -1.0       11.0       0.0        0.555812        10
6  1/31/19  AA          N7848A    13303              MIA     Miami, FL         13342            MKE          Milwaukee, WI   -3.0       -1.0       0.0        0.938014        10
7  1/8/19   UA          N8525S    14679              SAN     San Diego, CA     10423            AUS          Austin, TX      -2.0       -17.0      0.0        0.394183        10
8  1/9/19   NK          N441WN    14635              RSW     Fort Myers, FL    14679            SAN          San Diego, CA   10.0       7.0        0.0        0.993263        10
9  2/2/19   WN          N223WN    14635              RSW     Fort Myers, FL    13232            MDW          Chicago, IL     1.0        1.0        0.0        0.666184        10
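
A "most frequently occurring" duplicates table of this kind can be built by counting identical rows. The sketch below uses a handful of hypothetical rows with a reduced set of columns; how the report itself counts duplicates is not shown in the source, so this only illustrates the idea:

```python
from collections import Counter

# Hypothetical toy rows: (FL_DATE, OP_CARRIER, TAIL_NUM, ORIGIN, DESTINATION).
rows = [
    ("1/1/19", "B6", "N965WN", "RSW", "PVD"),
    ("1/1/19", "B6", "N965WN", "RSW", "PVD"),
    ("1/2/19", "UA", "N782SA", "RSW", "MDW"),
    ("1/1/19", "B6", "N965WN", "RSW", "PVD"),
    ("1/2/19", "UA", "N782SA", "RSW", "MDW"),
    ("2/2/19", "WN", "N223WN", "RSW", "MDW"),
]

# Count identical rows, keep those seen more than once, most frequent first.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}
most_frequent = sorted(duplicated.items(), key=lambda item: item[1], reverse=True)

for row, n in most_frequent:
    print(row, n)
```

Rows must be hashable for `Counter`, which is why each one is stored as a tuple.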